This study develops a Thai-language, artificial intelligence (AI)-based chatbot 
that analyzes Aedes-borne diseases through natural language processing. 
Relevant words in the symptoms that users report to the chatbot are selected 
as text attributes using term frequency-inverse document frequency (TF-IDF), 
and Jaccard similarity then measures how closely they match the information 
in the mosquito-borne disease database. The Line Messaging API handles 
communication between users and the chatbot through the Line application. 
The chatbot was implemented in PHP 7.2.34, with MySQL 5.7.32 for database 
management and Apache 2.2.29 serving as the bot server. In the performance 
evaluation, the chatbot identified user intentions with an intent accuracy of 
85.00%. Its usability, assessed with the system usability scale (SUS), received 
a score of 89.75, indicating a high level of user-friendliness. The study also 
found that appropriate tokenization enables accurate feature selection, which 
improves the accuracy of the Jaccard similarity measurement. Consequently, 
the chatbot can provide precise responses that align with the user's intent. 
1. INTRODUCTION 


Dengue outbreaks are occurring across the world, resulting in increasing numbers of patients, 
particularly in countries with tropical and subtropical climates [1]. The disease is caused by the dengue virus, 
which is transmitted by Aedes mosquitoes that accelerate its spread [2], [3]. Aedes mosquitoes are also the 
vector of Zika, chikungunya, and other viral infections, but dengue alone accounts for approximately 
390 million cases annually [4]. Dengue is thus a global health threat with significant social and economic 
impact. The disease is endemic in more than 100 countries in Southeast Asia, the Americas, the Western 
Pacific, Africa, and the Eastern Mediterranean region [5], and the World Health Organization estimates that 
between 2.5 and 3 billion people worldwide live in dengue-endemic areas [6] and are therefore at risk of 
infection. There is currently no specific treatment for dengue, and existing vaccines are preventive rather than 
curative, so care is limited to symptomatic treatment [7], [8]. Early detection of the disease therefore allows 
healthcare professionals to deliver timely medical treatment and lowers the mortality rate to below 1%. 
A patient with dengue suffers flu-like symptoms [4], with a high fever of 40 °C (104 °F) that is usually 
accompanied by at least two of the following symptoms: i) headache; ii) pain behind the eyes; 
iii) nausea and vomiting; iv) glandular swelling; v) joint, bone or muscle pain; and vi) rash [9]. 

For patients with severe symptoms, there is a critical period about 3-7 days after the onset of the 
illness, during which the fever persists alongside symptoms including: i) severe abdominal pain; ii) persistent 
vomiting; iii) bleeding gums; iv) vomiting blood; v) rapid breathing; and vi) fatigue and restlessness [9]. When 
critical dengue is suspected, patients should seek medical treatment as soon as possible to reduce the risk of 
plasma leakage, circulatory failure, severe bleeding, shortness of breath, and organ failure [10]. Without timely 
medical treatment, these severe symptoms can result in hypovolemic shock and carry a risk of death [9]. 

In Thailand, a country in Southeast Asia, dengue outbreaks occur throughout the nation due to the 
favorable climate for breeding Aedes mosquitoes. As a result, dengue fever poses a significant threat [11], 
[12] to the health of the population in the country. Furthermore, this illness also impacts the public healthcare 
system and the nation's economy. 

Therefore, the researcher aims to develop a Thai-language chatbot for analyzing Aedes-borne 
diseases using Jaccard similarity. The artificial intelligence (AI)-based chatbot uses natural language 
processing to analyze users' messages in order to diagnose diseases: word features in the text are selected 
with the term frequency-inverse document frequency (TF-IDF) method, and Jaccard similarity then measures 
their similarity to the entries in the Aedes-borne disease database. The chatbot incorporates the Line 
Messaging API for user-system communication through the Line application. It is developed using 
PHP 7.2.34 and uses MySQL 5.7.32 for database management, with Apache 2.2.29 serving as the bot 
server. This chatbot development can benefit the fields of medical services and public health, and can serve 
as a guideline for enhancing various service sectors in the future. 


2. METHOD 
2.1. Artificial intelligence 

AI refers to technology that mimics human intelligence [13], using sophisticated mathematical 
algorithms to process data and produce results [14] without requiring a new command for each task. 
Nowadays, AI, such as chatbots, facial recognition systems, virtual assistants, and more, plays a crucial role 
in our daily lives by performing various tasks for humans. Due to its ability for automatic learning, AI 
enables processing, analysis, planning, and decision-making similar to humans. 


2.2. Natural language processing 

Natural language processing is a branch of AI technology that leverages knowledge from various 
fields [15], such as linguistics, computer science, and statistics, to analyze the language humans use in daily 
communication. It enables computers to understand humans' intentions and meanings in their 
communication. Natural language processing technology can be applied in various domains, including 
healthcare, education, and business. 


2.3. Chatbot 

A chatbot is a computer program developed to communicate with humans in everyday natural 
language. The program simulates conversation so that users interact with it the same way they would with 
another person, and it can be used in real time, 24 hours a day [16], [17]. Chatbots are divided into 2 main 
types: i) rule-based chatbots, which process and understand human conversations according to predefined 
rules and conditions [18]; when a question falls outside those rules or conditions, the program cannot answer 
it or gives a wrong answer [19]; and ii) AI-based chatbots, or intelligent bots, which are developed with 
natural language processing and understand the language that humans use in everyday life without relying on 
predefined answers [20]. An AI-based chatbot can therefore comprehend the user's intent and provide a 
correct answer to the question. 


2.4. Line Messaging API 

Line Messaging API is a communication channel between Line application users and service 
providers. Messages are received and sent between the servers of the parties via the Line Platform, enabling 
the development of chatbots for interactive messaging with users. This API provides a seamless and efficient 
means of connecting users with various services and facilitating real-time communication, as shown in 
Figure 1. 
Figure 1. Line Messaging API working processes 


2.5. Architecture of the Thai-language chatbot analyzing mosquito-borne diseases using Jaccard 
similarity 

The chatbot was developed using PHP 7.2.34, with MySQL 5.7.32 to manage the database and 
Apache 2.2.29 functioning as the bot server. The server and the Line Messaging API allow users to 
communicate via the Line application. The architecture of the system is shown in Figure 2. The figure shows 
the working processes of the Line Messaging API, a communication channel between Line application users 
and the chatbot on the Line Platform. The bot server must be connected to the Line Platform, which transmits 
data between the server and the Line application users in JSON format. 
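
The JSON exchange described above can be illustrated with a minimal sketch. It is written in Python for illustration only (the chatbot in this study was implemented in PHP) and is not the authors' code. The field names (`events`, `replyToken`, `message`, `messages`) are assumed to follow the Line Messaging API webhook and reply formats; the sample body, the token value, and the function names are hypothetical. Request signature verification and the HTTP call to the Line Platform are omitted.

```python
import json


def extract_text_messages(webhook_body: str) -> list[tuple[str, str]]:
    """Return (reply_token, text) pairs for every text message in a webhook body."""
    payload = json.loads(webhook_body)
    pairs = []
    for event in payload.get("events", []):
        message = event.get("message", {})
        if event.get("type") == "message" and message.get("type") == "text":
            pairs.append((event["replyToken"], message["text"]))
    return pairs


def build_reply_body(reply_token: str, answer: str) -> str:
    """Serialize a text reply; ensure_ascii=False keeps Thai characters readable."""
    body = {"replyToken": reply_token, "messages": [{"type": "text", "text": answer}]}
    return json.dumps(body, ensure_ascii=False)


# Hypothetical webhook body containing one text message from a user.
SAMPLE_BODY = json.dumps({
    "events": [{
        "type": "message",
        "replyToken": "hypothetical-reply-token",
        "message": {"type": "text", "text": "fever and headache"},
    }]
})

for token, text in extract_text_messages(SAMPLE_BODY):
    print(build_reply_body(token, f"Received: {text}"))
```

In a deployed bot server, the string returned by `build_reply_body` would be sent back to the Line Platform, which delivers it to the user in the Line application.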


[Diagram labels: User, Line Application, Bot Server, Chatbot Process, Database, with Message arrows between the components] 

Figure 2. The architecture of the chatbot 


2.6. Chatbot development process 
This research develops a Thai-language chatbot that analyzes mosquito-borne diseases using 
Jaccard similarity. The working process of the chatbot is divided into 6 steps: 

a. Tokenization is the process of splitting a text into smaller units [21], [22] in order to determine word 
boundaries. This study applied the longest word pattern matching technique, in which the longest 
word that matches a dictionary entry is separated from the string. 

b. Stop word removal is the process of removing insignificant words, such as prepositions, pronouns, 
conjunctions, and interjections, from a text. Removing these words does not affect the meaning and 
reduces the size of the text [23]-[25]. 

c. Stemming is the process of replacing words with the same root, or words with the same meaning, by 
the same token [26], [27]. 

d. Feature selection is the process of selecting words that are significant to the text. In this research, feature 
selection was performed by using TF-IDF. 

e. Similarity measurement is the process of measuring the similarity between users' messages and the 
entries in the Aedes-borne disease database using Jaccard similarity. 

f. Response is the process of displaying the processing output to users. 
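
Steps a and b can be sketched as follows. This is a minimal illustration in Python, not the implementation used in this study: the dictionary, the stop word list, and the sample sentence are small hypothetical examples, and characters that match no dictionary entry are simply emitted one at a time.

```python
def longest_match_tokenize(text: str, dictionary: set[str]) -> list[str]:
    """Split text by repeatedly taking the longest dictionary word at the cursor."""
    max_len = max(map(len, dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        # Try the longest candidate first and shrink until a dictionary word matches.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
        else:
            # No dictionary word starts here: emit a single character.
            tokens.append(text[i])
            i += 1
    return tokens


def remove_stop_words(tokens: list[str], stop_words: set[str]) -> list[str]:
    """Drop tokens that carry little meaning for the analysis."""
    return [token for token in tokens if token not in stop_words]


# Hypothetical dictionary and stop words (have, fever, high, and, ache, headache, head).
WORDS = ["มี", "ไข้", "สูง", "และ", "ปวดหัว"]
DICTIONARY = set(WORDS) | {"ปวด", "หัว"}
STOP_WORDS = {"มี", "และ"}
SAMPLE_TEXT = "".join(WORDS)  # written without spaces, as in Thai

tokens = longest_match_tokenize(SAMPLE_TEXT, DICTIONARY)
print(tokens)
print(remove_stop_words(tokens, STOP_WORDS))
```

Because the longest candidate is tried first, the word for "headache" is kept as one token even though its two halves are also dictionary entries.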


2.7. Term frequency-inverse document frequency 

TF-IDF is a mathematical algorithm used to calculate the weight of significant words in a text [28], 
[29]. It assumes that a word appearing frequently in a text has a high term frequency (TF), while a word that 
appears in few other texts has a low document frequency (DF) and therefore a high inverse document 
frequency (IDF). The word weight is calculated through (1)-(3) [30], [31]: 

$$tf_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}} \tag{1}$$

where $n_{i,j}$ is the frequency of the word $t_i$ in statement $d_j$, and $\sum_{k} n_{k,j}$ is the sum of the 
frequencies of all words appearing in statement $d_j$ [30]. 

$$idf_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|} \tag{2}$$

where $|D|$ is the total number of statements in the corpus and $|\{j : t_i \in d_j\}|$ is the number of 
statements in the corpus in which the word $t_i$ appears [32]. 

$$\text{TF-IDF}_{i,j} = tf_{i,j} \times idf_i \tag{3}$$
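
The TF, IDF, and TF-IDF definitions in this subsection can be sketched in a few lines. The sketch is in Python for illustration and makes several assumptions that the text does not specify: the natural logarithm is used for IDF, features are selected by keeping the `k` highest-weighted words, and the tokenized statements are hypothetical English stand-ins for Thai tokens.

```python
import math
from collections import Counter


def tf_idf(statements: list[list[str]]) -> list[dict[str, float]]:
    """Return one {word: weight} mapping per tokenized statement."""
    n_statements = len(statements)
    # Document frequency: the number of statements that contain each word.
    df = Counter(word for tokens in statements for word in set(tokens))
    weights = []
    for tokens in statements:
        counts = Counter(tokens)
        total = sum(counts.values())
        weights.append({
            word: (count / total) * math.log(n_statements / df[word])
            for word, count in counts.items()
        })
    return weights


def select_features(weight: dict[str, float], k: int) -> list[str]:
    """Keep the k words with the highest TF-IDF weight (k is an assumed parameter)."""
    return sorted(weight, key=weight.get, reverse=True)[:k]


# Hypothetical tokenized statements.
STATEMENTS = [
    ["fever", "headache", "rash"],
    ["fever", "joint_pain"],
    ["fever", "rash", "red_eyes"],
]

weights = tf_idf(STATEMENTS)
print(weights[0])
print(select_features(weights[0], 1))
```

In this example "fever" occurs in every statement, so its IDF is log(3/3) = 0 and it receives no weight, while "headache" occurs in only one statement and is ranked first.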


2.8. Jaccard similarity 

Jaccard similarity, also known as the Jaccard coefficient, is a statistical measure of the similarity 
between sets. It is calculated by dividing the size of the intersection of sets A and B by the size of their union. 
The result lies between 0 and 1, where 0 means the sets have nothing in common and 1 means they are 
identical [33]-[35]. Jaccard similarity can be calculated through (4): 


$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|} \tag{4}$$
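
A minimal sketch of the Jaccard measurement and of choosing the most similar database entry is given below. It is written in Python for illustration; the symptom sets are hypothetical placeholders rather than clinical criteria or the contents of the database used in this study, and the function names are assumptions.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


def best_match(features: set[str], database: dict[str, set[str]]) -> tuple[str, float]:
    """Return the database entry with the highest Jaccard similarity to the features."""
    scored = ((name, jaccard(features, entry)) for name, entry in database.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical feature sets, for illustration only.
DATABASE = {
    "dengue": {"fever", "headache", "rash", "eye_pain"},
    "chikungunya": {"fever", "joint_pain", "rash"},
}
USER_FEATURES = {"fever", "headache", "eye_pain"}

print(best_match(USER_FEATURES, DATABASE))  # ('dengue', 0.75)
```

For the "dengue" entry the intersection has 3 words and the union has 3 + 4 - 3 = 4 words, so the similarity is 3/4 = 0.75; for the "chikungunya" entry it is 1/5 = 0.2.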


2.9. Measuring the chatbot efficiency 

Chatbot efficiency measurement refers to evaluating the accuracy of the chatbot to examine whether 
its interaction with users is effective enough to meet their needs. In this research, purposive sampling was 
applied to select 10 experts, comprising 5 information technology experts and 5 medical specialists. The 
experts asked 120 questions to measure the chatbot efficiency, which can be calculated through (5) [36]: 


$$\text{Intent Accuracy} = \frac{\text{Number of intents correctly identified}}{\text{Total number of intents}} \tag{5}$$


2.10. The chatbot usability assessment 

The chatbot usability was tested using the system usability scale (SUS), a questionnaire for 
evaluating the usability of an application, developed by John Brooke in 1986 [37]. It comprises 10 items, as 
shown in Table 1. Each item is rated on a 5-point Likert scale [38], where strongly agree scores 5 and strongly 
disagree scores 1 [39], [40]. The resulting SUS score is interpreted as shown in Table 2. In this research, the 
10 experts responding to the questionnaire were selected by purposive sampling and consisted of 
5 information technology experts and 5 medical specialists. The score from the questionnaire is calculated 
through (6): 

$$\text{SUS Score} = (X + Y) \times 2.5 \tag{6}$$

In which: 

X=the sum of the scores of all odd-numbered questions minus 5. 
Y=25 minus the sum of the scores of all even-numbered questions. 
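
The SUS calculation can be sketched as follows, with X and Y defined as above and their sum multiplied by 2.5. The sketch is in Python for illustration; the response lists are hypothetical and are not the experts' answers.

```python
def sus_score(responses: list[int]) -> float:
    """Compute the SUS score for one respondent from ten 1-5 ratings."""
    if len(responses) != 10 or any(r < 1 or r > 5 for r in responses):
        raise ValueError("expected ten ratings between 1 and 5")
    odd_sum = sum(responses[0::2])   # questions 1, 3, 5, 7, 9
    even_sum = sum(responses[1::2])  # questions 2, 4, 6, 8, 10
    x = odd_sum - 5
    y = 25 - even_sum
    return (x + y) * 2.5


# Hypothetical respondents.
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # 100.0
print(sus_score([3] * 10))                         # 50.0
```

The best possible answers (5 on every odd-numbered question, 1 on every even-numbered question) give X = 20 and Y = 20, so the score is 40 × 2.5 = 100.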
Table 1. Questions in the chatbot usability assessment form using the SUS 
No. Questions 
1 I would like to use this chatbot often. 
2 I think chatbots shouldn't be unnecessarily complicated. 
3 I think this chatbot is user-friendly. 
4 I think technical support is required to use this chatbot. 
5 I've found many functions of the chatbot work well. 
6 I think some functions of this chatbot are inconsistent. 
7 I think most people can quickly learn to use this chatbot. 
8 I find using this chatbot very cumbersome. 
9 I feel confident using this chatbot. 
10 I need to learn a lot before I can use this chatbot. 


Table 2. Description of the score from the SUS calculated from (6) 
Scores Meanings 
> 80.3 Excellent usability 
68-80.3 Good usability 
67 Moderate usability 
51-66 Low usability 
<51 Limited usability 


3. RESULTS AND DISCUSSION 

The output of the development of the Thai-language chatbot analyzing mosquito-borne diseases using 
Jaccard similarity is shown in Figure 3. Figure 3(a) displays the Rich menu, a menu that facilitates use of the 
chatbot, and Figure 3(b) shows an example of a natural language interaction between a user and the chatbot. 
In the performance evaluation, an intents confusion matrix was employed to evaluate the intent accuracy of 
the Aedes-borne disease analysis. The results are presented in Table 3. 

Figure 3. A Thai-language chatbot analyzing mosquito-borne diseases using Jaccard similarity, (a) an 
example of a Rich menu and (b) an example user-chatbot interaction 


Table 3 shows the performance evaluation results of the Thai-language chatbot analyzing 
mosquito-borne diseases using Jaccard similarity. Of the 120 questions, 102 were assigned to the correct 
intent, giving an intent accuracy of 85.00%. This indicates that the chatbot analyzes Aedes-borne diseases 
accurately and in line with the users' intentions. Figure 4 shows the results of the usability test of the chatbot 
through the SUS. The mean usability score was 89.75, which corresponds to excellent usability and indicates 
that the chatbot was user-friendly and straightforward. Because the respondents were already familiar with 
using the Line application in their daily lives, they could easily understand the functionality of the chatbot 
without needing to learn or study how to use the program. 


Table 3. The intents confusion matrix of the Thai-language chatbot analyzing mosquito-borne diseases using 
Jaccard similarity 

                     True class 
Intents class        Dengue virus   Zika virus   Chikungunya fever   Other 
Dengue virus         27             1            1                   1 
Zika virus           1              26           2                   1 
Chikungunya fever    2              3            24                  1 
Other                2              2            1                   25 
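
The intent accuracy can be recomputed from Table 3 with a short sketch, written in Python for illustration. The matrix values are copied from the table; the function name is an assumption. The result does not depend on whether rows or columns hold the true class, because only the diagonal and the grand total are used.

```python
# Confusion matrix from Table 3, in the order:
# dengue virus, Zika virus, chikungunya fever, other.
CONFUSION_MATRIX = [
    [27, 1, 1, 1],
    [1, 26, 2, 1],
    [2, 3, 24, 1],
    [2, 2, 1, 25],
]


def intent_accuracy(matrix: list[list[int]]) -> float:
    """Correctly identified intents (the diagonal) divided by all intents."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


print(intent_accuracy(CONFUSION_MATRIX))  # 0.85
```

The diagonal sums to 27 + 26 + 24 + 25 = 102 and the whole matrix sums to 120, so the accuracy is 102/120 = 0.85.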


[Chart: "System Usability Scale (SUS)"; horizontal axis: questions; vertical axis: score] 

Figure 4. The usability testing result of the Thai-language chatbot analyzing mosquito-borne diseases 
using Jaccard similarity through the SUS 


4. CONCLUSION 

This research focuses on developing a Thai-language chatbot analyzing mosquito-borne diseases using 
Jaccard similarity. It uses natural language processing to understand the intentions of users interacting 
with the chatbot. Text attributes are selected using TF-IDF, and Jaccard similarity is then used to measure 
similarity against a database of mosquito-borne diseases in order to provide appropriate responses to users. 
The chatbot is developed using PHP 7.2.34 and MySQL 5.7.32 for database management, with 
Apache 2.2.29 operating as the bot server, and incorporates the Line Messaging API for communication with 
users via the Line application. The research findings indicated that the Thai-language chatbot achieved an 
intent accuracy of 85.00%, accurately capturing user intentions. The SUS assessment also indicated a high 
usability score of 89.75, demonstrating that the chatbot was user-friendly. 

Therefore, it can be concluded that the Thai-language chatbot, which analyzes mosquito-borne 
diseases using Jaccard similarity, provides accurate and user-friendly interactions. The research found that 
the chatbot engages in conversations and provides precise responses that align with the user's intentions. 
This performance relies on tokenization to define appropriate boundaries for morphemes, as the Thai 
language lacks clear word boundaries. When appropriate boundaries are assigned to the words, the term 
weighting for feature selection using TF-IDF and the measurement of Jaccard similarity both become more 
accurate. 